Introduction of the Problem
Statistics for Testing Equality of High Dimensional Covariance Matrices
Dimension Reduction via the Singular Value Decomposition
Simulation Study Setup
Results
Summary
Future Work
References
Considering high-dimensional data
Five test statistics for testing equality of covariance matrices
Two dimension reduction methods via singular value decomposition
Interested in testing equality of 2 and \(k\) covariance matrices
Let \(\boldsymbol{\Sigma}_i \in \mathbb{R}_{p \times p}^>\) be the covariance matrix of the \(i^\text{th}\) population for \(i = 1, 2, \dots, k\).
We wish to test
\[H_0 : \boldsymbol{\Sigma}_1 = \ldots = \boldsymbol{\Sigma}_k.\]
Our sample covariance matrices, \(\boldsymbol{S}_1, \ldots, \boldsymbol{S}_k\), are distributed \(n_i \boldsymbol{S}_i \sim W_p(\boldsymbol{\Sigma}_i, n_i)\).
\[M = n \log |\boldsymbol{S}| - \sum \limits^k_{i=1} n_i \log |\boldsymbol{S}_i| \xrightarrow{d} \chi^2\]
This modified likelihood ratio test is only valid when each \(\boldsymbol{S}_i\) is nonsingular or when \(n_i \gg p\).
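As a minimal sketch, the statistic \(M\) can be computed from the sample covariance matrices with NumPy. The slides do not define the pooled matrix \(\boldsymbol{S}\); the sketch assumes \(\boldsymbol{S} = \frac{1}{n}\sum_i n_i \boldsymbol{S}_i\) with \(n = \sum_i n_i\). The function name `modified_lrt` is hypothetical.

```python
import numpy as np

def modified_lrt(covs, dfs):
    """Return M = n log|S| - sum_i n_i log|S_i|.

    covs: list of p x p sample covariance matrices S_i.
    dfs:  list of the matching n_i.
    Assumption (not stated in the slides): S = (1/n) * sum_i n_i S_i.
    """
    covs = [np.asarray(S, dtype=float) for S in covs]
    dfs = np.asarray(dfs, dtype=float)
    n = dfs.sum()
    pooled = sum(n_i * S for n_i, S in zip(dfs, covs)) / n

    def logdet(A):
        # slogdet avoids overflow/underflow in the determinant itself.
        sign, value = np.linalg.slogdet(A)
        if sign <= 0:
            raise ValueError("matrix is singular or not positive definite")
        return value

    return n * logdet(pooled) - sum(n_i * logdet(S) for n_i, S in zip(dfs, covs))
```

The explicit error on a non-positive determinant mirrors the validity condition above: when some \(\boldsymbol{S}_i\) is singular, the log-determinant is undefined and the statistic cannot be evaluated.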
\[W = \frac{n}{2} \left \{ \sum \limits^k_{i = 1}\frac{n_i}{n} tr \left(\boldsymbol{S}_i\boldsymbol{S}^{-1}\boldsymbol{S}_i\boldsymbol{S}^{-1}\right) - \sum \limits^k_{i=1}\sum \limits^k_{j = 1}\frac{n_in_j}{n^2}tr \left(\boldsymbol{S}_i\boldsymbol{S}^{-1}\boldsymbol{S}_j\boldsymbol{S}^{-1}\right) \right \} \xrightarrow{d} \chi^2\]
This Wald test is only valid when \(\boldsymbol{S}\) is nonsingular or when \(n \gg p\).
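The Wald statistic can be sketched the same way. As an assumption, the pooled matrix is again taken to be \(\boldsymbol{S} = \frac{1}{n}\sum_i n_i \boldsymbol{S}_i\); the function name `wald_statistic` is hypothetical. Linear solves replace the explicit inverse \(\boldsymbol{S}^{-1}\).

```python
import numpy as np

def wald_statistic(covs, dfs):
    """Return W for sample covariance matrices covs with sizes dfs.

    Assumption (not stated in the slides): S = (1/n) * sum_i n_i S_i.
    """
    covs = [np.asarray(S, dtype=float) for S in covs]
    dfs = np.asarray(dfs, dtype=float)
    n = dfs.sum()
    weights = dfs / n
    pooled = sum(w * S for w, S in zip(weights, covs))
    # A_i = S^{-1} S_i; by cyclicity of the trace,
    # tr(S_i S^{-1} S_j S^{-1}) = tr(A_i A_j).
    A = [np.linalg.solve(pooled, S) for S in covs]
    first = sum(w * np.trace(Ai @ Ai) for w, Ai in zip(weights, A))
    second = sum(wi * wj * np.trace(Ai @ Aj)
                 for wi, Ai in zip(weights, A)
                 for wj, Aj in zip(weights, A))
    return 0.5 * n * (first - second)
```

When all \(\boldsymbol{S}_i\) are equal, every \(\boldsymbol{S}^{-1}\boldsymbol{S}_i\) is the identity and the two sums cancel, so the statistic is zero.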
\[ d^2 = \frac{tr \left( \boldsymbol{\Sigma}_i - \boldsymbol{\Sigma}_j \right)^2 }{p} = \frac{tr \left( \boldsymbol{\Sigma}_i^2 \right) }{p} + \frac{tr \left( \boldsymbol{\Sigma}_j^2 \right) }{p} - \frac{2tr \left( \boldsymbol{\Sigma}_i \boldsymbol{\Sigma}_j \right) }{p}. \]
Dividing by \(p\) is not typical, but it makes the norm of the identity matrix equal to 1.
The norm is invariant to rotation.
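A short numerical check of the two properties above, written as a sketch: the function name `scaled_sq_distance` and the randomly generated matrices are illustrative assumptions, not part of the study.

```python
import numpy as np

def scaled_sq_distance(sigma_i, sigma_j):
    """d^2 = tr((Sigma_i - Sigma_j)^2) / p."""
    diff = np.asarray(sigma_i, dtype=float) - np.asarray(sigma_j, dtype=float)
    p = diff.shape[0]
    return np.trace(diff @ diff) / p

rng = np.random.default_rng(0)
p = 5
A = rng.standard_normal((p, p))
B = rng.standard_normal((p, p))
sigma_1 = A @ A.T          # arbitrary symmetric positive semidefinite matrices
sigma_2 = B @ B.T
Q, _ = np.linalg.qr(rng.standard_normal((p, p)))  # random orthogonal matrix

d2 = scaled_sq_distance(sigma_1, sigma_2)
d2_rotated = scaled_sq_distance(Q @ sigma_1 @ Q.T, Q @ sigma_2 @ Q.T)
# Three-term expansion from the display above.
d2_expanded = (np.trace(sigma_1 @ sigma_1) + np.trace(sigma_2 @ sigma_2)
               - 2 * np.trace(sigma_1 @ sigma_2)) / p
```

Here `d2`, `d2_rotated`, and `d2_expanded` agree up to floating-point error, and `scaled_sq_distance(np.eye(p), np.zeros((p, p)))` equals 1.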
\[ T_{Sc} = \sum \limits^{k}_{i < j} \frac{ \left( \hat{a}_{2i} + \hat{a}_{2j} - \frac{2}{p}tr \left( \boldsymbol{S}_i \boldsymbol{S}_j \right) \right) ^ 2}{\theta} \xrightarrow{d} \chi^2\]
\[\theta = 4 \hat{a}_2^2 \left( \sum \limits_{i < j}^ k \left( \frac{1}{n_i} + \frac{1}{n_j} \right) + (k - 1)(k - 2) \sum \limits_{i = 1}^k n_i^{-2} \right)\]
\[\hat{a}_{2i} = \frac{tr \left( \boldsymbol{V}_i^2 \right) - \frac{1}{n_i}tr \left( \boldsymbol{V}_i \right)^2}{ \left( n_i - 1 \right) \left(n_i + 2 \right)p} \xrightarrow{P} \frac{tr \left( \boldsymbol{\Sigma}_i^2 \right) }{p}\]
\[\hat{a}_2 = \frac{tr \left( \boldsymbol{V}^2 \right) - \frac{1}{n}tr \left( \boldsymbol{V} \right)^2 }{(n - 1)(n + 2)p} \xrightarrow{P} \frac{tr \left( \boldsymbol{\Sigma}^2 \right) }{p}\]
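The estimator \(\hat{a}_{2i}\) can be sketched as follows. Two assumptions are made because the slides do not spell them out: \(\boldsymbol{V}_i = n_i \boldsymbol{S}_i\) (the Wishart-distributed matrix), and \(tr(\boldsymbol{V}_i)^2\) is read as \(\left(tr\,\boldsymbol{V}_i\right)^2\). The function name `a2_hat` is hypothetical.

```python
import numpy as np

def a2_hat(V, n):
    """Estimate tr(Sigma^2) / p from V, assumed to be V = n * S ~ W_p(Sigma, n)."""
    V = np.asarray(V, dtype=float)
    p = V.shape[0]
    tr_V2 = np.sum(V * V.T)   # tr(V^2) without forming the matrix product
    tr_V = np.trace(V)
    return (tr_V2 - tr_V ** 2 / n) / ((n - 1) * (n + 2) * p)

# Small Monte Carlo illustration with Sigma = I, so tr(Sigma^2) / p = 1.
rng = np.random.default_rng(1)
n, p, reps = 15, 50, 500
estimates = np.empty(reps)
for r in range(reps):
    X = rng.standard_normal((n, p))   # n draws from N_p(0, I)
    estimates[r] = a2_hat(X.T @ X, n)
mean_estimate = estimates.mean()
```

Under these assumptions the average of the estimates should sit close to 1 even though \(p > n\), which is the property the high-dimensional tests rely on.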
\[T_{C} = \sum \limits^{k}_{i < j} \frac{ \left( \hat{b} - 1 \right)^2 }{\hat{\delta}^2} \xrightarrow{d} \chi^2\]
\[\hat{b} = \frac{\hat{a}_{2i}}{\hat{a}_{2j}} \quad \quad \hat{\delta}^2 = 4 \left( \frac{2\hat{a}_4}{p \hat{a}_2^2} \sum \limits^k_{i=1} \frac{1}{n_i -1} + \sum \limits^k_{i=1} \frac{1}{ \left( n_i - 1 \right)^2 } \right)\]
\[ T_{S14} = \sum \limits^{k}_{i < j} \frac{ \left( \hat{a}_{2i} + \hat{a}_{2j} - \frac{2}{p}tr \left( \boldsymbol{S}_i \boldsymbol{S}_j \right) \right) ^ 2}{\theta} \xrightarrow{d} \chi^2\]
\[\hat{a}_{2i} = \frac{ \left(n_i -2 \right) \left( n_i -1 \right) tr \left( \boldsymbol{V}_i^2 \right) - n \left( n - k \right) tr \left( \boldsymbol{D}^2_i \right) + tr \left( \boldsymbol{V}_i \right)^2 }{pn_i \left( n_i -1 \right) \left( n_i -2 \right) \left( n_i -3 \right) } \xrightarrow{P} \frac{tr \left( \boldsymbol{\Sigma}_i^2 \right) }{p}\]
\[T_I = \prod \limits^{k}_{i < j} \tilde{\lambda}_* \tilde{h}_* \tilde{\gamma}_* \xrightarrow{d} F\]
\[\tilde{\lambda}_* = \frac{\max(\tilde{\lambda}_{i1}, \tilde{\lambda}_{j1})}{\min(\tilde{\lambda}_{i1}, \tilde{\lambda}_{j1})}\]
\[\tilde{\lambda}_{i1} = \hat{\lambda}_{i1} - \frac{tr(\boldsymbol{S}_i) - \hat{\lambda}_{i1}}{n_i - 2}\]
\[\tilde{h}_* = \max(|\tilde{h}^T_i \tilde{h}_j|, |\tilde{h}^T_i \tilde{h}_j|^{-1})\]
\[\tilde{h}_i \]
\[\tilde{\gamma}_* = \max \left( \frac{\tilde{\kappa}_i}{\tilde{\kappa}_j}, \frac{\tilde{\kappa}_j}{\tilde{\kappa}_i} \right)\]
\[\tilde{\kappa}_i = tr(\boldsymbol{S}_i) - \tilde{\lambda}_{i1}\]
For samples \(\boldsymbol{X}_i\), let \(\boldsymbol{X} = \left[ \boldsymbol{X}_1 \vdots \ \ldots \ \vdots \boldsymbol{X}_i\vdots \ \ldots \ \vdots \boldsymbol{X}_k \right]^T\).
Let \(\boldsymbol{M}\) represent the scatter matrix of \(\boldsymbol{X}\).
Let \(\boldsymbol{M} = \boldsymbol{UDU}^T\) represent the singular value decomposition of \(\boldsymbol{M}\).
Partition \(\boldsymbol{U}\) such that \(\boldsymbol{U} = \left[ \boldsymbol{U}_1 \vdots \, \boldsymbol{U}_2 \right]\) with \(\boldsymbol{U}_1 \in \mathbb{R}^{p \times q}\), where \(q = 1, 2, \dots , p\).
Project \(\boldsymbol{X}_i\) to \(q\) dimensions by taking \(\boldsymbol{X}_{Ri}^T = \boldsymbol{U}_1^T \boldsymbol{X}_i^T\).
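The data scatter steps above can be sketched as follows. The slides do not say whether the scatter matrix is centered; the sketch assumes \(\boldsymbol{M} = \boldsymbol{X}_c^T \boldsymbol{X}_c\), where \(\boldsymbol{X}_c\) is the stacked data with the overall column means removed. The function name `data_scatter_projection` and the toy data are hypothetical.

```python
import numpy as np

def data_scatter_projection(groups, q):
    """Project each n_i x p group onto the top-q eigenvectors of the scatter matrix.

    Assumption: the scatter matrix is computed from the stacked data after
    removing the overall column means.
    """
    X = np.vstack(groups)                      # stack the k samples by rows
    Xc = X - X.mean(axis=0)
    M = Xc.T @ Xc                              # p x p scatter matrix
    eigvals, eigvecs = np.linalg.eigh(M)       # symmetric M: M = U D U^T
    order = np.argsort(eigvals)[::-1]          # eigh returns ascending order
    U1 = eigvecs[:, order[:q]]                 # p x q leading eigenvectors
    reduced = [np.asarray(g) @ U1 for g in groups]   # X_Ri = X_i U_1
    return reduced, U1

# Toy data: the first coordinate carries most of the variance.
rng = np.random.default_rng(3)
scale = np.array([10.0, 1.0, 1.0, 1.0, 1.0, 1.0])
toy_groups = [rng.standard_normal((15, 6)) * scale for _ in range(2)]
toy_reduced, toy_U1 = data_scatter_projection(toy_groups, q=2)
```

Each reduced group has shape \(n_i \times q\), so tests that require \(n_i > p\) can then be applied with \(q\) in place of \(p\).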
For groups \(\boldsymbol{X}_i\), let \(\boldsymbol{S}_i\) be the sample covariance matrix.
Let \(\widehat{\boldsymbol{M}} := \left[\boldsymbol{S}_2 - \boldsymbol{S}_1 \vdots \ldots \vdots \boldsymbol{S}_i - \boldsymbol{S}_1 \vdots \ldots \vdots \boldsymbol{S}_k - \boldsymbol{S}_1 \right]\), where \(i = 2, \ldots, k\) indexes the groups compared with the first group.
Let \(\widehat{\boldsymbol{M}} = \boldsymbol{UDV}^T\) represent the singular value decomposition of \(\widehat{\boldsymbol{M}}\).
Partition \(\boldsymbol{U}\) such that \(\boldsymbol{U} = \left[ \boldsymbol{U}_1 \vdots \, \boldsymbol{U}_2 \right]\) with \(\boldsymbol{U}_1 \in \mathbb{R}^{p \times q}\).
Project \(\boldsymbol{X}_i\) to \(q\) dimensions by taking \(\boldsymbol{X}_{Ri}^T = \boldsymbol{U}_1^T \boldsymbol{X}_i^T\).
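The covariance differences method can be sketched in the same style. The function names `covariance_difference_basis` and `project` are hypothetical, and the small matrices below are illustrative only.

```python
import numpy as np

def covariance_difference_basis(covs, q):
    """Return U_1 (p x q): leading left singular vectors of [S_2 - S_1 | ... | S_k - S_1]."""
    covs = [np.asarray(S, dtype=float) for S in covs]
    M_hat = np.hstack([S - covs[0] for S in covs[1:]])   # p x p(k-1)
    # NumPy returns singular values in descending order.
    U, singular_values, Vt = np.linalg.svd(M_hat, full_matrices=False)
    return U[:, :q]

def project(groups, U1):
    """X_Ri = X_i U_1 for each n_i x p group."""
    return [np.asarray(X) @ U1 for X in groups]

# Illustration: only the first coordinate differs between the groups.
S1 = np.eye(3)
S2 = np.eye(3)
S2[0, 0] = 4.0
S3 = np.eye(3)
U1_demo = covariance_difference_basis([S1, S2, S3], q=1)
```

In this illustration the single retained direction is the first coordinate axis (up to sign), which is the only direction in which the covariance matrices differ.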
The critical value data sets generated for the simulation study have the following characteristics:
\(n_i = 15\)
\(p = 100\)
We generated the \(k\) populations from \(\mathcal{N}_p \left( \boldsymbol{0}, \boldsymbol{\Sigma}_i \right)\).
The number of repetitions is 100,000.
\(cv = \inf\{x \in \mathbb{R} : 1 - \alpha \leq \hat{F}_{T}(x) \}\), where \(\alpha = 0.05\).
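The critical value definition above is the smallest simulated statistic whose empirical CDF value reaches \(1 - \alpha\). A minimal sketch, assuming the simulated null statistics are held in an array (the function name `empirical_critical_value` is hypothetical):

```python
import numpy as np

def empirical_critical_value(null_stats, alpha=0.05):
    """cv = inf{x : 1 - alpha <= F_hat(x)} for the empirical CDF of null_stats."""
    stats = np.sort(np.asarray(null_stats, dtype=float))
    N = stats.size
    # F_hat at the j-th order statistic is at least j / N, so take the
    # smallest j with j / N >= 1 - alpha. The tiny offset guards against
    # floating-point error when N * (1 - alpha) is an integer.
    j = int(np.ceil(N * (1.0 - alpha) - 1e-12))
    j = min(max(j, 1), N)
    return stats[j - 1]
```

For example, with the values 1 through 100 and \(\alpha = 0.05\), the function returns 95, since \(\hat{F}(95) = 0.95\) and \(\hat{F}(94) = 0.94\).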
The power simulation data sets generated for the simulation study have the following characteristics:
\(n_i = 15\)
\(p = 100\)
We generated the \(k\) populations from \(\mathcal{N}_p \left( \boldsymbol{0}, \boldsymbol{\Sigma}_i \right)\) and \(\mathcal{N}_p \left( \boldsymbol{0}, c \boldsymbol{\Sigma}_i \right)\).
The number of repetitions is 1,000.
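The two-stage design (simulate critical values under \(H_0\), then estimate power under the alternative) can be sketched for two groups as follows. Everything specific here is an assumption made for illustration: the placeholder statistic `frobenius_statistic` is not one of the five tests, and the values of \(p\), \(c\), and the repetition counts are reduced so the sketch runs quickly.

```python
import numpy as np

def frobenius_statistic(X1, X2):
    """Placeholder statistic (an assumption): tr((S_1 - S_2)^2) / p."""
    diff = np.cov(X1, rowvar=False) - np.cov(X2, rowvar=False)
    return np.sum(diff * diff) / diff.shape[0]

def simulate_statistics(statistic, sigma_1, sigma_2, n_i, reps, rng):
    """Draw two N_p(0, Sigma) samples per repetition and evaluate the statistic."""
    zero = np.zeros(sigma_1.shape[0])
    out = np.empty(reps)
    for r in range(reps):
        X1 = rng.multivariate_normal(zero, sigma_1, size=n_i)
        X2 = rng.multivariate_normal(zero, sigma_2, size=n_i)
        out[r] = statistic(X1, X2)
    return out

rng = np.random.default_rng(2024)
p, n_i, c = 20, 15, 4.0          # illustrative values, smaller than the study's
sigma = np.eye(p)

# Stage 1: critical value from the null distribution (equal covariances).
null_stats = simulate_statistics(frobenius_statistic, sigma, sigma, n_i, 500, rng)
cv = np.sort(null_stats)[int(np.ceil(0.95 * null_stats.size)) - 1]

# Stage 2: power is the rejection rate when the second covariance is c * Sigma.
alt_stats = simulate_statistics(frobenius_statistic, sigma, c * sigma, n_i, 200, rng)
power = np.mean(alt_stats > cv)
```

Any of the test statistics can be passed in place of the placeholder, with dimension reduction applied to `X1` and `X2` before the statistic is evaluated.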
We saw improvements in power with dimension reduction.
The high-dimensional tests appear to perform poorly with spiked covariance matrices, and dimension reduction does not appear to help them much.
Dimension reduction with the modified likelihood ratio test improved power and appeared to outperform the high-dimensional tests when the covariance matrix is spiked.
Explore the asymptotic properties of the tests in more detail.
Investigate why the data scatter dimension reduction method appears to do better than the covariance differences method.
Look at another nontrivial \(k\)-group situation \((k \neq 3)\).
Look at the Wald test with dimension reduction.
See how shrinkage estimators perform with these tests.
Look at one-sample high-dimensional tests.
Schott (2007), Srivastava, Yanagihara, and Kubokawa (2014), Ishii, Yata, and Aoshima (2016), Chaipitak and Chongcharoen (2013), Ledoit and Wolf (2004)
Chaipitak, Saowapha, and Samruam Chongcharoen. 2013. “A Test for Testing the Equality of Two Covariance Matrices for High-Dimensional Data.” Journal of Applied Sciences 13 (2): 270–77.
Ishii, Aki, Kazuyoshi Yata, and Makoto Aoshima. 2016. “Asymptotic Properties of the First Principal Component and Equality Tests of Covariance Matrices in High-Dimension, Low-Sample-Size Context.” Journal of Statistical Planning and Inference 170 (March): 186–99.
Ledoit, Olivier, and Michael Wolf. 2004. “A Well-Conditioned Estimator for Large-Dimensional Covariance Matrices.” Journal of Multivariate Analysis 88 (2): 365–411.
Schott, James R. 2007. “A Test for the Equality of Covariance Matrices When the Dimension Is Large Relative to the Sample Sizes.” Computational Statistics & Data Analysis 51 (12): 6535–42.
Srivastava, Muni S., Hirokazu Yanagihara, and Tatsuya Kubokawa. 2014. “Tests for Covariance Matrices in High Dimension with Less Sample Size.” Journal of Multivariate Analysis 130 (September): 289–309.